Logs, metrics, and traces are the three pillars of observability, providing complementary perspectives on system behavior. Logs capture discrete, unstructured events with rich context; metrics aggregate numeric data points for monitoring and alerting; and traces track the flow of requests across distributed services. A truly observable system leverages all three: metrics tell you something is wrong, logs tell you why, and traces tell you where.
What they are: Logs are discrete, timestamped records of individual events or state changes. Each log entry captures specific details about what happened at a particular moment.
Format: Typically unstructured or semi-structured text (JSON, key-value pairs). A log line might be {"timestamp": "2025-03-21T10:00:00Z", "level": "ERROR", "service": "auth", "message": "Login failed", "user": "john@example.com", "error": "invalid credentials"}.
Strengths: Rich contextual detail, good for debugging specific incidents, human-readable, captures unexpected conditions.
Weaknesses: High volume, expensive to store long-term, difficult to aggregate and analyze at scale without tools.
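To make the semi-structured format above concrete, here is a minimal sketch of a JSON log formatter using Python's standard `logging` module. The `JsonFormatter` class, the `extra_fields` attribute, and the `"auth"` service name are illustrative choices, not a standard API:

```python
import json
import logging
import sys
from datetime import datetime, timezone

class JsonFormatter(logging.Formatter):
    """Render each log record as one JSON object per line."""
    def format(self, record):
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "level": record.levelname,
            "service": "auth",  # assumed service name from the example above
            "message": record.getMessage(),
        }
        # Merge any structured context attached via logging's `extra` mechanism.
        entry.update(getattr(record, "extra_fields", {}))
        return json.dumps(entry)

logger = logging.getLogger("auth")
handler = logging.StreamHandler(sys.stdout)
handler.setFormatter(JsonFormatter())
logger.addHandler(handler)

logger.error(
    "Login failed",
    extra={"extra_fields": {"user": "john@example.com", "error": "invalid credentials"}},
)
```

Keeping one JSON object per line is what lets log pipelines parse, filter, and index entries without fragile regex matching.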
What they are: Metrics are numeric measurements of system behavior aggregated over time, stored as time-series data. Examples: request rate, error rate, latency percentiles, CPU usage.
Format: Key-value pairs with timestamps and dimensions (tags) like http_requests_total{method="GET", endpoint="/api/users", status="200"} 15234.
Strengths: Low overhead, ideal for dashboards and alerts, efficient long-term storage, enables statistical analysis (averages, percentiles, rates).
Weaknesses: Aggregation discards detailed context; metrics tell you that something is wrong, but not why.
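The labeled-counter format shown above can be sketched in a few lines. This is an illustrative toy, not a real client library; the `Counter`, `inc`, and `expose` names are assumptions made for the example:

```python
from collections import defaultdict

class Counter:
    """Toy Prometheus-style counter: one monotonically increasing value per label set."""
    def __init__(self, name):
        self.name = name
        self.values = defaultdict(float)

    def inc(self, amount=1, **labels):
        # Sort labels so {method, status} and {status, method} hit the same series.
        key = tuple(sorted(labels.items()))
        self.values[key] += amount

    def expose(self):
        # Render each series in the text exposition style shown above.
        lines = []
        for key, value in sorted(self.values.items()):
            label_str = ", ".join(f'{k}="{v}"' for k, v in key)
            lines.append(f"{self.name}{{{label_str}}} {value:g}")
        return "\n".join(lines)

requests = Counter("http_requests_total")
requests.inc(method="GET", endpoint="/api/users", status="200")
requests.inc(method="GET", endpoint="/api/users", status="200")
print(requests.expose())
```

Each distinct label combination becomes its own time series, which is why high-cardinality labels (like user IDs) can explode storage costs in real metric systems.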
What they are: Traces are end-to-end representations of a request as it travels through a distributed system. A trace is composed of spans: individual operations (database queries, API calls, computations) within the request path.
Format: Span data includes operation name, start/end time, parent-child relationships, and associated metadata (attributes, events, errors).
Strengths: Shows request flow across services, identifies bottlenecks, reveals latency breakdowns, maps service dependencies.
Weaknesses: Requires instrumentation of all services, sampling needed for high-volume systems, complex to implement.
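The span structure described above can be sketched with a context manager that records timing and parent-child links. This is a simplified illustration, not a real tracing SDK; the `span` helper and the fields it records are assumptions for the example:

```python
import time
import uuid
from contextlib import contextmanager

spans = []  # finished spans collected in this process

@contextmanager
def span(name, parent_id=None):
    """Record one operation as a span: name, timing, and parent linkage."""
    s = {
        "span_id": uuid.uuid4().hex[:8],
        "parent_id": parent_id,  # None marks the root span of the trace
        "name": name,
        "start": time.time(),
    }
    try:
        yield s
    finally:
        s["end"] = time.time()
        spans.append(s)

# A request span wrapping a child database-query span.
with span("GET /api/users") as root:
    with span("db.query", parent_id=root["span_id"]):
        time.sleep(0.01)  # simulated database call
```

Because children finish before their parents, spans are collected inner-first; a tracing backend reassembles the tree from the parent IDs and renders the latency breakdown as a flame-style waterfall.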
When an incident occurs, the three pillars work together in a typical workflow:

Alert: A metric-based alert fires (e.g., latency > 2s)
Discover: Use metrics dashboard to see which service is impacted
Explore: Sample traces to find slow requests and identify problematic operations
Debug: Inspect logs for that specific trace to see error messages, query details, or application state
Resolve: Fix root cause; metrics confirm recovery
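The alert step above can be illustrated with a percentile check over raw latency samples. This is a toy sketch of the evaluation a monitoring system performs; the nearest-rank percentile method and the 2-second threshold are assumptions taken from the example trigger:

```python
def percentile(samples, pct):
    """Nearest-rank percentile over raw latency samples (seconds)."""
    data = sorted(samples)
    rank = max(0, min(len(data) - 1, round(pct / 100 * len(data)) - 1))
    return data[rank]

def latency_alert(samples, threshold_s=2.0, pct=99):
    """Fire when the chosen latency percentile exceeds the threshold."""
    return percentile(samples, pct) > threshold_s

# 98 fast requests plus 2 slow outliers: p99 = 2.5 s, so the alert fires.
latencies = [0.1] * 98 + [2.5, 3.0]
print(latency_alert(latencies))  # True
```

Alerting on a high percentile rather than the average catches tail latency that a mean would hide, which is why p95/p99 thresholds are the common choice for latency alerts.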
Modern observability platforms unify these three pillars. Honeycomb and Lightstep emphasize high-cardinality events, treating logs and traces as first-class citizens. Datadog and New Relic provide integrated dashboards across metrics, logs, and traces. The OpenTelemetry project provides a unified standard for generating and collecting telemetry data across all three pillars, enabling vendor-neutral instrumentation.